Subprocess (Caching) (Operator Toolbox)

Synopsis

This operator is a variation of the Subprocess operator that can cache the results of the subprocess so that they are reused in subsequent runs.

Description

The original Subprocess operator implements a process within a process. Whenever a Subprocess operator is reached during process execution, the entire subprocess is executed first. Once the subprocess execution is complete, control returns to the parent process.

This operator implements the same logic but can cache the outputs of the subprocess, so that its execution is skipped in subsequent runs. This feature supports the iterative design of processes that contain operators with long runtimes.

Please note that the caching functionality is only available in RapidMiner Studio. If deployed to RapidMiner Server, the operator behaves the same as the original Subprocess operator regardless of the configuration of the cache.
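
The behavior described above can be pictured with a minimal Python sketch. It is not the operator's implementation: the names `execute`, `cache`, and `caching_available` are hypothetical, and change detection is deliberately left out. The sketch only shows that a cached result lets the subprocess be skipped, and that without caching (as on RapidMiner Server) the subprocess always runs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the operator's cache: holds the outputs of the last run.
Cache = Dict[str, List[object]]


def execute(subprocess: Callable[[List[object]], List[object]],
            inputs: List[object],
            cache: Cache,
            caching_available: bool = True) -> List[object]:
    """Run the subprocess, or reuse its cached outputs when caching is available."""
    if not caching_available:
        # Models the Server case: behave like the plain Subprocess operator.
        return subprocess(inputs)
    if "outputs" in cache:
        # Skip the (possibly long-running) subprocess entirely.
        return cache["outputs"]
    outputs = subprocess(inputs)
    cache["outputs"] = outputs
    return outputs


calls = []  # records how often the subprocess actually ran


def slow_subprocess(inputs: List[object]) -> List[object]:
    calls.append(1)
    return [x * 2 for x in inputs]


cache: Cache = {}
first = execute(slow_subprocess, [1, 2, 3], cache)   # runs the subprocess
second = execute(slow_subprocess, [1, 2, 3], cache)  # served from the cache
```

After both calls, `calls` contains a single entry: the second call reused the stored outputs instead of running the subprocess again.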

Input

  • input (IOObject)

    The Subprocess operator can have multiple inputs. When one input is connected, another input port becomes available to accept a further input (if any). The order of inputs is preserved: the object supplied at the first input port of the subprocess is available at the first input port of the nested chain (inside the subprocess).

Output

  • output (IOObject)

    The Subprocess operator can have multiple outputs. When one output is connected, another output port becomes available to deliver a further output (if any). The order of outputs is preserved: the object delivered at the first output port of the subprocess is delivered at the first output of the outer process.

Parameters

  • caching_strategy

    Defines the caching strategy.

    • auto: The operator reruns the subprocess when it detects changes in the input or the embedded process. Please note that the underlying heuristic does not perform a full scan of the input data, so some changes may remain undetected.
    • manual: The operator reruns the subprocess only after its cache is cleared manually. Changes to the input or the embedded process are ignored.
    • none: The operator always runs the subprocess and no cache is used.

    Range:

  • clear_cache

    Clears the cache associated with this operator. This parameter is only available if caching is enabled and if the cache contains results from a previous run.

    Range:
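
The three caching strategies and the clear_cache action can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's actual code: the class `CachingSubprocess` and the function `cheap_fingerprint` are hypothetical, the fingerprint (row count plus first and last row) is an invented stand-in for the undocumented heuristic, and changes to the embedded process are not modeled.

```python
from typing import Callable, List, Tuple


def cheap_fingerprint(rows: List[tuple]) -> Tuple:
    """Hypothetical heuristic: inspects size and first/last row, not every value."""
    if not rows:
        return (0,)
    return (len(rows), rows[0], rows[-1])


class CachingSubprocess:
    """Illustrative model of the 'auto', 'manual', and 'none' strategies."""

    def __init__(self, subprocess: Callable[[List[tuple]], object],
                 strategy: str = "auto") -> None:
        if strategy not in ("auto", "manual", "none"):
            raise ValueError(f"unknown caching strategy: {strategy}")
        self.subprocess = subprocess
        self.strategy = strategy
        self.runs = 0  # how often the subprocess actually executed
        self._has_cache = False
        self._outputs = None
        self._fingerprint = None

    def clear_cache(self) -> None:
        """Models the clear_cache action: forget any stored result."""
        self._has_cache = False
        self._outputs = None
        self._fingerprint = None

    def _run(self, rows: List[tuple]) -> object:
        self.runs += 1
        return self.subprocess(rows)

    def execute(self, rows: List[tuple]) -> object:
        if self.strategy == "none":
            # No cache is used; always run the subprocess.
            return self._run(rows)
        fingerprint = cheap_fingerprint(rows)
        if self._has_cache and (self.strategy == "manual"
                                or fingerprint == self._fingerprint):
            # manual: reuse until cleared; auto: reuse while no change is detected.
            return self._outputs
        self._outputs = self._run(rows)
        self._fingerprint = fingerprint
        self._has_cache = True
        return self._outputs


def total(rows: List[tuple]) -> int:
    return sum(row[0] for row in rows)


op = CachingSubprocess(total, strategy="auto")
data = [(1,), (2,), (3,)]
fresh = op.execute(data)     # runs the subprocess
cached = op.execute(data)    # unchanged input: served from the cache
data[1] = (20,)              # change in the middle, invisible to the fingerprint
stale = op.execute(data)     # change goes undetected, old result is returned
data[2] = (30,)              # change in the last row, visible to the fingerprint
updated = op.execute(data)   # change detected: subprocess runs again
```

In this sketch the edit to the middle row goes unnoticed because the fingerprint never looks at it, which mirrors the caveat that a heuristic without a full scan can miss changes. With `strategy="manual"`, the stored result would be returned for any input until `clear_cache()` is called.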

Tutorial Processes

Automatic caching

This tutorial process demonstrates the caching strategy 'auto' for embedded processes with constant and changing inputs.